Mapping Tree Diversity: Analyzing Tree Traits Across Vancouver’s Neighborhoods#
This exploratory data analysis investigates spatial and geographical patterns in the Vancouver Street Trees dataset. Specifically, it explores how trees are distributed across neighborhoods in terms of abundance, size, and species diversity.
Urban trees contribute significantly to environmental and social well-being in cities—they provide shade, improve air quality, reduce urban heat, and enhance neighborhood livability. Understanding how tree characteristics vary between neighborhoods can support more equitable and effective urban planning, sustainability efforts, and biodiversity initiatives.
This analysis draws on a 5,000-row subset of the full dataset. It focuses on the columns neighbourhood_name, diameter, height_range_id, and genus_name to identify patterns and trends in Vancouver’s urban forest.
Questions of Interest#
Which neighborhoods have the highest and lowest number of trees?
Is there a relationship between neighborhood and average tree diameter (or height range)?
Are certain neighborhoods dominated by specific genera?
Analysis#
We’ll start by loading the necessary libraries and reading in the Vancouver Street Trees dataset.
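A minimal sketch of the loading step, assuming the subset is stored as a CSV file. The helper name `load_trees` and the inline two-row sample are hypothetical stand-ins, since the notebook's actual file path is not shown.

```python
import io

import pandas as pd


def load_trees(source):
    """Read the street-trees CSV from a path, URL, or file-like object."""
    return pd.read_csv(source)


# Hypothetical two-row stand-in for the real file, so the sketch runs anywhere.
sample_csv = io.StringIO(
    "neighbourhood_name,genus_name,species_name,diameter,height_range_id\n"
    "Renfrew-Collingwood,ACER,RUBRUM,10.0,2\n"
    "Kitsilano,PRUNUS,SERRULATA,4.5,1\n"
)
trees_df = load_trees(sample_csv)
print(trees_df.head())
```

With the real data, the same helper would be called with the path to the course-provided CSV in place of `sample_csv`.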
Summarizing the Data#
Now we’ll go ahead and review the structure and summary statistics of the dataset:
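The two outputs that follow look like the results of `DataFrame.info()` and `DataFrame.describe(include="all")`. The sketch below shows those calls on a hypothetical three-row miniature of the dataset; the values are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the real frame has 5,000 rows.
trees_df = pd.DataFrame(
    {
        "neighbourhood_name": ["Kitsilano", "Kitsilano", "Sunset"],
        "genus_name": ["ACER", "PRUNUS", "ACER"],
        "diameter": [10.0, 4.5, 18.0],
        "date_planted": ["2004-02-16", None, None],
    }
)

# Column dtypes and non-null counts.
trees_df.info()

# include="all" summarises text columns (unique, top, freq)
# alongside numeric ones (mean, std, quartiles) in a single table.
summary = trees_df.describe(include="all")
print(summary)
```

Statistics that do not apply to a column's type are reported as NaN, which is why the summary table mixes filled and empty cells.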
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 5000 non-null int64
1 std_street 5000 non-null object
2 on_street 5000 non-null object
3 species_name 5000 non-null object
4 neighbourhood_name 5000 non-null object
5 date_planted 2363 non-null object
6 diameter 5000 non-null float64
7 street_side_name 5000 non-null object
8 genus_name 5000 non-null object
9 assigned 5000 non-null object
10 civic_number 5000 non-null int64
11 plant_area 4950 non-null object
12 curb 5000 non-null object
13 tree_id 5000 non-null int64
14 common_name 5000 non-null object
15 height_range_id 5000 non-null int64
16 on_street_block 5000 non-null int64
17 cultivar_name 2658 non-null object
18 root_barrier 5000 non-null object
19 latitude 5000 non-null float64
20 longitude 5000 non-null float64
dtypes: float64(3), int64(5), object(13)
memory usage: 820.4+ KB
| | Unnamed: 0 | std_street | on_street | species_name | neighbourhood_name | date_planted | diameter | street_side_name | genus_name | assigned | ... | plant_area | curb | tree_id | common_name | height_range_id | on_street_block | cultivar_name | root_barrier | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5000.000000 | 5000 | 5000 | 5000 | 5000 | 2363 | 5000.000000 | 5000 | 5000 | 5000 | ... | 4950 | 5000 | 5000.000000 | 5000 | 5000.00000 | 5000.000000 | 2658 | 5000 | 5000.000000 | 5000.000000 |
| unique | NaN | 603 | 607 | 171 | 22 | 1599 | NaN | 4 | 67 | 2 | ... | 38 | 2 | NaN | 361 | NaN | NaN | 176 | 2 | NaN | NaN |
| top | NaN | CAMBIE ST | CAMBIE ST | SERRULATA | Renfrew-Collingwood | 2004-02-16 | NaN | ODD | ACER | N | ... | 10 | Y | NaN | KWANZAN FLOWERING CHERRY | NaN | NaN | KWANZAN | N | NaN | NaN |
| freq | NaN | 52 | 49 | 463 | 384 | 7 | NaN | 2554 | 1218 | 4564 | ... | 736 | 4593 | NaN | 383 | NaN | NaN | 383 | 4679 | NaN | NaN |
| mean | 14861.920400 | NaN | NaN | NaN | NaN | NaN | 12.340888 | NaN | NaN | NaN | ... | NaN | NaN | 128682.584600 | NaN | 2.73440 | 2960.227000 | NaN | NaN | 49.247349 | -123.107128 |
| std | 8680.023278 | NaN | NaN | NaN | NaN | NaN | 9.266600 | NaN | NaN | NaN | ... | NaN | NaN | 75412.260406 | NaN | 1.56957 | 2086.861052 | NaN | NaN | 0.021251 | 0.049137 |
| min | 2.000000 | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | ... | NaN | NaN | 36.000000 | NaN | 0.00000 | 0.000000 | NaN | NaN | 49.202783 | -123.220560 |
| 25% | 7192.750000 | NaN | NaN | NaN | NaN | NaN | 4.000000 | NaN | NaN | NaN | ... | NaN | NaN | 61321.500000 | NaN | 2.00000 | 1300.000000 | NaN | NaN | 49.230152 | -123.144178 |
| 50% | 14870.000000 | NaN | NaN | NaN | NaN | NaN | 10.000000 | NaN | NaN | NaN | ... | NaN | NaN | 130130.500000 | NaN | 2.00000 | 2600.000000 | NaN | NaN | 49.247981 | -123.105861 |
| 75% | 22366.750000 | NaN | NaN | NaN | NaN | NaN | 18.000000 | NaN | NaN | NaN | ... | NaN | NaN | 191332.000000 | NaN | 4.00000 | 4100.000000 | NaN | NaN | 49.263275 | -123.063484 |
| max | 29992.000000 | NaN | NaN | NaN | NaN | NaN | 71.000000 | NaN | NaN | NaN | ... | NaN | NaN | 270750.000000 | NaN | 9.00000 | 9100.000000 | NaN | NaN | 49.293930 | -123.023311 |
11 rows × 21 columns
The dataset contains 5,000 rows and 21 columns. It provides details on street trees in Vancouver, including their genus, species, diameter, height range, and the neighborhood in which they are located.
Although the dataset includes many attributes, this analysis will focus on the columns most relevant to answering the research questions:
• genus_name: Helps identify patterns in tree dominance and biodiversity in neighborhoods.
• neighbourhood_name: Used to compare tree distributions across different parts of Vancouver.
• diameter: Can act as a rough estimate of tree maturity.
• height_range_id: General classification of tree size, useful for understanding tree growth variation by location.
Columns with extensive missing data, such as date_planted and cultivar_name, will be excluded from further analysis to ensure data quality and maintain focus on key variables.
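One way to make "extensive missing data" concrete is to compute the share of missing values per column and drop columns above a cut-off. The sketch below uses a hypothetical five-row frame, and the 40% threshold is an assumption rather than a value taken from the analysis.

```python
import pandas as pd

# Hypothetical toy frame that mimics the missingness pattern in the summary above.
trees_df = pd.DataFrame(
    {
        "genus_name": ["ACER", "PRUNUS", "ACER", "TILIA", "ULMUS"],
        "date_planted": ["2004-02-16", None, None, "1999-11-03", None],
        "cultivar_name": [None, "KWANZAN", None, None, None],
        "plant_area": ["10", "B", None, "7", "10"],
    }
)

# Share of missing values in each column (0.0 = complete, 1.0 = empty).
missing_share = trees_df.isna().mean()

# Assumed cut-off: treat a column as too sparse when over 40% of rows are missing.
threshold = 0.4
sparse_cols = missing_share[missing_share > threshold].index.tolist()

# Keep only the columns that pass the cut-off.
analysis_df = trees_df.drop(columns=sparse_cols)
print(missing_share)
```

In the real data, date_planted (2,363 of 5,000 non-null) and cultivar_name (2,658 of 5,000 non-null) would both exceed this cut-off, while plant_area (4,950 non-null) would not.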
Figure 1: Tree Height Distribution by Neighborhood (Top 10)#
This faceted histogram shows the distribution of height_range_id, a general height class, for the 10 neighborhoods with the most street trees in Vancouver. Each facet represents one neighborhood, so height profiles can be compared side by side.
By focusing on the neighborhoods with the most trees, we can see which areas tend toward taller or shorter height classes and how much the size distribution varies. This gives a view of the structural diversity of Vancouver’s urban forest in its most heavily planted neighborhoods.
Figure 2: Distribution of Tree Counts Among Vancouver’s Top 10 Neighborhoods#
The plot below ranks the neighborhoods in top_trees_df by tree count, showing which of the top 10 have the most and the fewest trees. It gives context for the later comparisons of tree size and genus diversity.
Figure 2b: Tree Count by Neighborhood (All Neighborhoods)#
Replacing top_trees_df with trees_df in the same chart shows tree counts for every neighborhood in the city, making both the highest-count and lowest-count neighborhoods easy to identify.
Figure 3: Tree Diameter Distribution Across Top 10 Neighborhoods#
The boxplot created below explores how tree sizes vary across neighborhoods by capturing medians, spread, and outliers. These patterns may reflect differences in neighborhood development timelines or maintenance practices.
Figure 4: Tree Genus Concentration by Neighborhood#
To explore whether certain neighborhoods are dominated by specific tree genera, the heatmap below shows the count of each genus across the top neighborhoods. This way, we can compare two categorical variables (neighborhood and genus) and visualize frequency patterns. Genera with fewer than 30 trees are grouped into a new category labeled “Other” to reduce clutter and improve readability.
Discussion#
This analysis explored how Vancouver’s street trees vary across neighborhoods in terms of abundance, size, and species diversity. Several key patterns emerged:
Certain neighborhoods, such as Renfrew-Collingwood and Kensington-Cedar Cottage, have noticeably higher numbers of street trees. This raises interesting questions about the factors influencing tree distribution, including neighborhood size, development history, urban planning practices, and the relative age or maturity of these areas.
Some neighborhoods show a wider spread or a higher median tree diameter, pointing to older, more established tree populations. The extreme outliers in the boxplot also point to wide variation in tree maturity within neighborhoods, which could inform maintenance or replacement priorities.
In terms of genus diversity, certain areas are dominated by a single genus like Acer, while others host a broader mix. These patterns may reflect municipal planting strategies or environmental factors like sunlight, soil, and space availability.
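"Dominated by a single genus" can be quantified as the share of a neighborhood's trees that belong to its most common genus. The sketch below is one possible calculation on hypothetical toy data; it is not a result from the analysis.

```python
import pandas as pd

# Hypothetical stand-in for top_trees_df.
top_trees_df = pd.DataFrame(
    {
        "neighbourhood_name": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "genus_name": ["ACER", "ACER", "ACER", "PRUNUS",
                       "ACER", "PRUNUS", "TILIA", "TILIA"],
    }
)

# Trees per (neighbourhood, genus) pair.
genus_counts = (
    top_trees_df.groupby(["neighbourhood_name", "genus_name"])
    .size()
    .rename("n_trees")
    .reset_index()
)

# Each genus's share of its neighbourhood's trees.
totals = genus_counts.groupby("neighbourhood_name")["n_trees"].transform("sum")
genus_counts["share"] = genus_counts["n_trees"] / totals

# The highest-share genus in each neighbourhood.
dominant = (
    genus_counts.sort_values("share", ascending=False)
    .drop_duplicates("neighbourhood_name")
    .set_index("neighbourhood_name")
)
print(dominant[["genus_name", "share"]])
```

A share close to 1 indicates a neighborhood dominated by one genus, while a low top share indicates a broader mix.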
Overall, the findings generally aligned with expectations, but they also raised new questions about the historical and environmental influences on Vancouver’s urban forest. Further analysis could incorporate data on neighborhood development, proximity to green spaces, or socioeconomic factors to better understand what drives these patterns and how they relate to broader issues like urban planning, environmental equity, and community resilience.
Concluding Remarks#
This analysis revealed meaningful variations in Vancouver’s urban tree population across neighborhoods, in terms of quantity, size, and species composition. While some neighborhoods have abundant and mature trees, others show more diversity or potential maintenance needs. These insights contribute to understanding the ecological and planning dynamics shaping the city’s urban forest.
The findings not only confirm expected patterns but also open avenues for further exploration, especially by integrating additional data sources like real estate values or neighborhood quality. Such work could help inform more equitable and resilient urban greening strategies moving forward.
Dashboard#
The dashboard below was created to showcase the relationship between tree genus concentration and tree count distribution in Vancouver’s top 10 neighborhoods. Users can interact with the dashboard by selecting a neighborhood directly from the bar chart, which filters the heatmap below to show the distribution of genera within that area. Additionally, a genus dropdown allows users to focus on specific tree types, revealing patterns in where certain genera are planted. This dual-filter approach helps highlight variations in tree diversity across neighborhoods.
With more time, I would integrate real estate data, such as average property values or rental rates by neighborhood, to explore whether there’s a correlation between urban tree diversity and housing economics. This could provide deeper insight into how ecological investment and biodiversity intersect with affordability, equity, or gentrification in Vancouver.
References#
Not all of the work in this notebook is original. Some techniques and ideas were informed by publicly available resources and course materials. These elements were used solely for educational purposes.
Resources Used#
• Data Source – The cleaned and filtered Vancouver Street Trees dataset (5,000 rows) was provided by instructors of the “Data Visualization” course through the University of British Columbia (UBC) Key Capabilities in Data Science certificate. This data source is a subset of the original data obtained from the City of Vancouver Open Data Portal under the Open Government Licence – Vancouver.
• Data Wrangling Approach – The data cleaning and transformation steps, specifically filtering for a new dataframe, were guided by methods learned in the “Programming in Python” course from the Key Capabilities in Data Science certificate at UBC.
• Data Visualization – Visualizations were created by me using Altair, guided by course materials from the “Data Visualization” course within the UBC Key Capabilities in Data Science certificate.
• Attribution – Portions of code structuring, explanation clarity, and troubleshooting were assisted through conversational support with ChatGPT, an AI language model by OpenAI. All final decisions, implementations, and interpretations were completed independently.